Geometric ergodicity
The data augmentation algorithm
Roy, Vivekananda, Khare, Kshitij, Hobert, James P.
Data augmentation (DA) algorithms are popular Markov chain Monte Carlo (MCMC) algorithms often used for sampling from intractable probability distributions. This review article comprehensively surveys DA MCMC algorithms, highlighting their theoretical foundations, methodological implementations, and diverse applications in frequentist and Bayesian statistics. The article discusses tools for studying the convergence properties of DA algorithms, presents various strategies for accelerating their convergence, describes several extensions of the DA framework, and outlines promising directions for future research. The paper aims to serve as a resource for researchers and practitioners seeking to leverage data augmentation techniques in MCMC by providing key insights and synthesizing recent developments.
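As a concrete instance of the DA recipe this review surveys, here is a minimal Python sketch of the Albert-Chib data augmentation sampler for Bayesian probit regression, one of the best-known DA algorithms. The model, prior precision, and all names below are illustrative assumptions, not code from the paper.

```python
# A minimal sketch (not the paper's code): Albert-Chib DA for Bayesian
# probit regression with prior beta ~ N(0, prior_prec^{-1} I).
import numpy as np
from scipy.stats import truncnorm

def probit_da(X, y, n_iter=1000, prior_prec=0.01, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + prior_prec * np.eye(p))  # Cov(beta | z)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # I-step: impute latents z_i ~ N(x_i' beta, 1), truncated by y_i.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)  # standardized truncation bounds
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # P-step: draw beta | z ~ N(V X'z, V).
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

Alternating the I-step (impute latent data) and P-step (draw parameters) in this way is exactly the two-block structure whose convergence properties the review studies.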
Geometric ergodicity of SGLD via reflection coupling
Li, Lei, Liu, Jian-Guo, Wang, Yuliang
Stochastic Gradient Langevin Dynamics (SGLD), first introduced by Welling and Teh [25], has attracted a lot of attention in various areas [18, 26, 4]. The SGLD algorithm and its variants perform remarkably well on many practical sampling and optimization tasks. Recent years have witnessed great progress in the theoretical study of SGLD, with most researchers focusing on its discretization error, namely the "distance" between the SGLD algorithm and the corresponding Langevin diffusion in terms of the time step (or learning rate) η [12, 18, 26, 16]. SGLD itself can be regarded as a stochastic process, and its ergodicity is also of great importance. So far, justifications of the geometric ergodicity of SGLD have mostly relied on strong convexity conditions, namely the strong log-concavity of the target distribution. In [4], under strong convexity assumptions, the authors used a synchronous coupling to establish the geometric ergodicity of SGLD and some other numerical schemes in terms of the Wasserstein-2 distance. However, the strong convexity assumption limits the applicability of that result, and the ergodicity of the SGLD algorithm in a general setting, as well as the existence of an invariant measure, remains unclear. In this paper, we study the geometric ergodicity of SGLD in a locally nonconvex setting. The main technique we apply is reflection coupling [8], which was originally designed to study the contraction properties of continuous SDEs.
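For reference, a minimal sketch of the SGLD iteration the abstract refers to, θ ← θ − η ∇Û(θ) + √(2η) ξ with ξ ~ N(0, I); the noisy double-well gradient in the usage example below is an illustrative stand-in for a minibatch gradient, not code from the paper.

```python
# A minimal sketch of the SGLD update (illustrative, not from [25]):
# theta <- theta - eta * grad_U_hat(theta) + sqrt(2 * eta) * N(0, I).
import numpy as np

def sgld(grad_u_hat, theta0, eta=1e-3, n_steps=10_000, seed=None):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    path = np.empty((n_steps,) + theta.shape)
    for k in range(n_steps):
        theta = (theta - eta * grad_u_hat(theta, rng)
                 + np.sqrt(2 * eta) * rng.standard_normal(theta.shape))
        path[k] = theta
    return path

# Illustrative target: the double-well potential U(x) = (x^2 - 1)^2 is
# nonconvex but convex outside a compact set -- the locally nonconvex
# regime studied here. The added noise mimics stochastic-gradient error.
path = sgld(lambda x, rng: 4 * x * (x**2 - 1) + 0.1 * rng.standard_normal(x.shape),
            theta0=np.array([2.0]))
```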
Geometric Ergodicity in Modified Variations of Riemannian Manifold and Lagrangian Monte Carlo
Brofos, James A., Roy, Vivekananda, Lederman, Roy R.
Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and Lagrangian Monte Carlo (LMC) have emerged as powerful methods of Bayesian inference. Unlike Euclidean Hamiltonian Monte Carlo (EHMC) and the Metropolis-adjusted Langevin algorithm (MALA), the geometric ergodicity of these Riemannian algorithms has not been extensively studied. On the other hand, the manifold Metropolis-adjusted Langevin algorithm (MMALA) has recently been shown to exhibit geometric ergodicity under certain conditions. This work investigates mixtures of the LMC and RMHMC transition kernels with MMALA in order to equip the resulting methods with an "inherited" geometric ergodicity theory. We motivate this mixture kernel by an analogy between single-step HMC and MALA. We then evaluate the original and modified transition kernels on several benchmark Bayesian inference tasks.
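The mixture-kernel construction can be sketched abstractly: with small probability the chain applies the geometrically ergodic kernel (MMALA in the paper), and otherwise the Riemannian kernel. The two step functions below are assumed placeholders, not the authors' implementation; both are assumed to be Markov kernels preserving the same target.

```python
# A minimal sketch: mix a geometrically ergodic kernel into the main
# Riemannian kernel so the mixture inherits geometric ergodicity.
import numpy as np

def mixture_chain(x0, rmhmc_step, mmala_step, beta=0.1, n_iter=1000, seed=None):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_iter):
        if rng.uniform() < beta:
            x = mmala_step(x, rng)  # ergodicity-inheriting component
        else:
            x = rmhmc_step(x, rng)  # efficient exploration component
        samples.append(x)
    return samples
```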
Asynchronous and Distributed Data Augmentation for Massive Data Settings
Zhou, Jiayuan, Khare, Kshitij, Srivastava, Sanvesh
Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in every iteration, imposing serious restrictions on their usage despite their advantages. Addressing this problem, we develop a framework for extending any DA algorithm to exploit asynchronous and distributed computing. The extended DA algorithm is indexed by a parameter $r \in (0, 1)$ and is called Asynchronous and Distributed (AD) DA, with the original DA as its parent. Any ADDA starts by dividing the full data into $k$ smaller disjoint subsets and storing them on $k$ processes, which could be machines or processors. Every iteration of ADDA augments only an $r$-fraction of the $k$ data subsets with some positive probability and leaves the remaining $(1-r)$-fraction of the augmented data unchanged. The parameter draws are obtained using the $r$-fraction of new and the $(1-r)$-fraction of old augmented data. For many choices of $k$ and $r$, the fractional updates of ADDA lead to a significant speed-up over the parent DA in massive data settings, and ADDA reduces to the distributed version of its parent DA when $r=1$. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate the numerical advantages of ADDA in three representative examples corresponding to different kinds of massive data settings encountered in applications. In all these examples, our DA generalization is significantly faster than its parent DA algorithm for all choices of $k$ and $r$. We also establish geometric ergodicity of the ADDA Markov chain for all three examples, which in turn yields asymptotically valid standard errors for estimates of the desired posterior quantities.
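A hedged sketch of a single ADDA sweep may help fix ideas: only an $r$-fraction of the $k$ shards refresh their augmented data before each parameter draw. The helpers impute_block and draw_parameter stand in for the parent DA's I-step and P-step and are assumptions; the uniform shard selection below is also a simplification of the paper's asynchronous scheme.

```python
# A simplified sketch of one ADDA iteration (impute_block and
# draw_parameter stand in for the parent DA's I-step and P-step).
import numpy as np

def adda_step(theta, aug_blocks, data_blocks, impute_block, draw_parameter,
              r=0.25, seed=None):
    rng = np.random.default_rng(seed)
    k = len(data_blocks)
    m = max(1, int(round(r * k)))                   # shards refreshed this sweep
    refresh = rng.choice(k, size=m, replace=False)
    for j in refresh:                               # partial I-step
        aug_blocks[j] = impute_block(theta, data_blocks[j], rng)
    theta = draw_parameter(aug_blocks, rng)         # P-step on new + old blocks
    return theta, aug_blocks
```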
Analyzing Relevance Vector Machines using a single penalty approach
Dixit, Anand, Roy, Vivekananda
The relevance vector machine (RVM) is a popular sparse Bayesian learning model typically used for prediction. Recently it has been shown that improper priors assumed on the multiple penalty parameters in RVM may lead to an improper posterior. Currently in the literature, the sufficient conditions for posterior propriety of RVM do not allow improper priors over the multiple penalty parameters. In this article, we propose a single penalty relevance vector machine (SPRVM) model in which the multiple penalty parameters are replaced by a single penalty, and we consider a semi-Bayesian approach for fitting the SPRVM. The necessary and sufficient conditions for posterior propriety of SPRVM are more liberal than those of RVM and allow for several improper priors over the penalty parameter. We also prove geometric ergodicity of the Gibbs sampler used to analyze the SPRVM model, and hence we can estimate the asymptotic standard errors associated with the Monte Carlo estimates of the means of the posterior predictive distribution. Such a Monte Carlo standard error cannot be computed in the case of RVM, since the rate of convergence of the Gibbs sampler used to analyze RVM is not known. The predictive performance of RVM and SPRVM is compared by analyzing three real-life datasets.
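The asymptotic standard errors mentioned above are the kind licensed by geometric ergodicity together with a Markov chain central limit theorem; a minimal batch-means sketch (not the authors' code) is:

```python
# A minimal batch-means sketch for Monte Carlo standard errors; `chain`
# is a 1-D array of draws of the posterior functional of interest.
import numpy as np

def batch_means_se(chain):
    n = len(chain)
    b = int(np.sqrt(n))                  # batch length ~ sqrt(n)
    a = n // b                           # number of batches
    means = chain[: a * b].reshape(a, b).mean(axis=1)
    var_hat = b * np.var(means, ddof=1)  # estimates the asymptotic variance
    return np.sqrt(var_hat / n)
```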
On the Theoretical Properties of the Exchange Algorithm
The exchange algorithm is one of the most popular extensions of the Metropolis-Hastings algorithm for sampling from doubly-intractable distributions. However, theoretical exploration of the exchange algorithm is very limited. For example, natural questions like `Does the exchange algorithm converge at a geometric rate?' or `Does the exchange algorithm admit a central limit theorem?' have not been answered. In this paper, we study the theoretical properties of the exchange algorithm in terms of asymptotic variance and convergence speed. We compare the exchange algorithm with the original Metropolis-Hastings algorithm and provide both necessary and sufficient conditions for geometric ergodicity of the exchange algorithm, which can be applied to various practical applications such as exponential random graph models and Ising models. A central limit theorem for the exchange algorithm is also established. Meanwhile, a concrete example involving the binomial model with conjugate and non-conjugate priors is treated in detail with sharp convergence rates. Our results justify the theoretical usefulness of the exchange algorithm.
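The key algebraic point, that the intractable normalizing constants cancel in the acceptance ratio, is visible in a minimal sketch of one exchange update. Here log_q, log_prior, propose, and simulate (an exact sampler from the likelihood, which the algorithm requires) are assumed callables, and the proposal is assumed symmetric.

```python
# A minimal sketch of one exchange update for a target
# pi(theta | y) proportional to q(y | theta) / Z(theta) * prior(theta),
# assuming a symmetric proposal. The Z terms cancel in log_alpha.
import numpy as np

def exchange_step(theta, y, log_q, log_prior, propose, simulate, rng):
    theta_new = propose(theta, rng)
    w = simulate(theta_new, rng)  # auxiliary data ~ f(. | theta_new)
    log_alpha = (log_q(y, theta_new) - log_q(y, theta)
                 + log_prior(theta_new) - log_prior(theta)
                 + log_q(w, theta) - log_q(w, theta_new))
    if np.log(rng.uniform()) < log_alpha:
        return theta_new
    return theta
```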
Dimension-free convergence rates for gradient Langevin dynamics in RKHS
Muzellec, Boris, Sato, Kanji, Massias, Mathurin, Suzuki, Taiji
Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracted considerable attention lately as a way to provide convergence guarantees in a non-convex setting. However, the known rates grow exponentially with the dimension of the space. In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite-dimensional Hilbert space. More precisely, we derive non-asymptotic, dimension-free convergence rates for GLD/SGLD when performing regularized non-convex optimization in a reproducing kernel Hilbert space. Among other tools, the convergence analysis relies on the properties of a stochastic differential equation, its discrete-time Galerkin approximation, and the geometric ergodicity of the associated Markov chains.
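To fix ideas, a minimal sketch of the (full-gradient) GLD iteration on the coefficients of a finite Galerkin truncation of an RKHS element, f = Σ_j c_j φ_j, with an explicit RKHS-norm regularizer; all names and constants below are illustrative assumptions, not the paper's setup.

```python
# A minimal sketch: full-gradient Langevin dynamics on Galerkin
# coefficients c, with regularizer lam * ||c||^2 / 2 added to the loss.
import numpy as np

def gld_galerkin(grad_loss, c0, lam=1e-2, eta=1e-3, beta=1.0,
                 n_steps=5000, seed=None):
    rng = np.random.default_rng(seed)
    c = np.asarray(c0, dtype=float)
    for _ in range(n_steps):
        grad = grad_loss(c) + lam * c  # gradient of the regularized objective
        c = c - eta * grad + np.sqrt(2 * eta / beta) * rng.standard_normal(c.shape)
    return c
```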